Warning: this paper contains content that may be offensive or upsetting. Considering the large amount of content created online by the minute, slang-aware automatic tools are critically needed to promote social good, and assist policymakers and moderators in restricting the spread of offensive language, abuse, and hate speech. Despite the success of large language models and the spontaneous emergence of slang dictionaries, it is unclear how far their combination goes in terms of slang understanding for downstream social good tasks. In this paper, we provide a framework to study different combinations of representation learning models and knowledge resources for a variety of downstream tasks that rely on slang understanding. Our experiments show the superiority of models that have been pre-trained on social media data, while the impact of dictionaries is positive only for static word embeddings. Our error analysis identifies core challenges for slang representation learning, including out-of-vocabulary words, polysemy, variance, and annotation disagreements, which can be traced to characteristics of slang as a quickly evolving and highly subjective language.
Warning: this paper contains content that may be offensive or upsetting. In the current context where online platforms have been effectively weaponized in a variety of geo-political events and social issues, Internet memes make fair content moderation at scale even more difficult. Existing work on meme classification and tracking has focused on black-box methods that do not explicitly consider the semantics of the memes or the context of their creation. In this paper, we pursue a modular and explainable architecture for Internet meme understanding. We design and implement multimodal classification methods that perform example- and prototype-based reasoning over training cases, while leveraging both textual and visual SOTA models to represent the individual cases. We study the relevance of our modular and explainable models in detecting harmful memes on two existing tasks: Hate Speech Detection and Misogyny Classification. We compare the performance between example- and prototype-based methods, and between text, vision, and multimodal models, across different categories of harmfulness (e.g., stereotype and objectification). We devise a user-friendly interface that facilitates the comparative analysis of examples retrieved by all of our models for any given meme, informing the community about the strengths and limitations of these explainable methods.
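The example-based reasoning described above can be reduced to a nearest-neighbour scheme over case embeddings. Below is a minimal, hypothetical sketch of that idea, not the paper's implementation: toy 2-D vectors and a plain cosine/majority-vote rule stand in for the SOTA text and vision encoders.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def retrieve_and_predict(query, cases, labels, k=3):
    """Rank training cases by cosine similarity to the query embedding and
    predict the query's label by majority vote over the top-k neighbours."""
    ranked = sorted(range(len(cases)), key=lambda i: -cosine(query, cases[i]))[:k]
    pred = Counter(labels[i] for i in ranked).most_common(1)[0][0]
    return ranked, pred

# Toy 2-D "embeddings": two harmful memes near (1, 0), one benign near (0, 1).
cases = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0]]
labels = [1, 1, 0]  # 1 = harmful, 0 = benign
neighbours, pred = retrieve_and_predict([0.95, 0.05], cases, labels, k=3)
```

The retrieved `neighbours` are exactly what an interface like the one described can show to a moderator as the explanation for `pred`.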
Neural language models (LMs) have achieved impressive results on various language-based reasoning tasks by utilizing latent knowledge encoded in their own pretrained parameters. To make this reasoning process more explicit, recent works retrieve a rationalizing LM's internal knowledge by training or prompting it to generate free-text rationales, which can be used to guide task predictions made by either the same LM or a separate reasoning LM. However, rationalizing LMs require expensive rationale annotation and/or computation, without any assurance that their generated rationales improve LM task performance or faithfully reflect LM decision-making. In this paper, we propose PINTO, an LM pipeline that rationalizes via prompt-based learning, and learns to faithfully reason over rationales via counterfactual regularization. First, PINTO maps out a suitable reasoning process for the task input by prompting a frozen rationalizing LM to generate a free-text rationale. Second, PINTO's reasoning LM is fine-tuned to solve the task using the generated rationale as context, while regularized to output less confident predictions when the rationale is perturbed. Across four datasets, we show that PINTO significantly improves the generalization ability of the reasoning LM, yielding higher performance on both in-distribution and out-of-distribution test sets. Also, we find that PINTO's rationales are more faithful to its task predictions than those generated by competitive baselines.
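The counterfactual regularization idea can be illustrated with a toy loss. The following hypothetical sketch (not PINTO's actual training objective) combines cross-entropy on the real rationale with a penalty on confident predictions made from a perturbed rationale:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def counterfactual_loss(logits_real, logits_perturbed, gold, lam=1.0):
    """Cross-entropy on the gold label given the real rationale, plus a
    counterfactual term that penalizes confident (low-entropy) predictions
    when the rationale has been perturbed."""
    ce = -math.log(softmax(logits_real)[gold])
    q = softmax(logits_perturbed)
    entropy = -sum(qi * math.log(qi) for qi in q)
    return ce + lam * (math.log(len(q)) - entropy)  # max-entropy gap as penalty

# A model that becomes uncertain under a perturbed rationale is penalized
# less than one that ignores the rationale and stays confident.
faithful = counterfactual_loss([2.0, 0.0], [0.0, 0.0], gold=0)
shortcut = counterfactual_loss([2.0, 0.0], [2.0, 0.0], gold=0)
```

The ordering `faithful < shortcut` captures the intended incentive: the reasoning model should only be confident when its rationale genuinely supports the prediction.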
Procedural text understanding is a challenging language reasoning task that requires models to track entity states throughout a developing narrative. A complete procedural understanding solution should combine three core aspects: local and global views of the input, and a global view of the output. Prior methods have considered only a subset of these aspects, resulting in either low precision or low recall. In this paper, we propose Coalescing Global and Local Information (CGLI), a new model that builds entity- and timestep-aware input representations (local input) considering the whole context (global input), and jointly models entity states with a structured prediction objective (global output). CGLI thus simultaneously optimizes for precision and recall. We further extend CGLI with additional output layers and integrate it into a story reasoning framework. Extensive experiments on a popular procedural text understanding dataset show that our model achieves state-of-the-art results; experiments on a story reasoning benchmark show the positive impact of our model on downstream reasoning.
Large public knowledge graphs, such as Wikidata, contain billions of statements about tens of millions of entities, motivating a wide variety of use cases that leverage such knowledge graphs. However, practice shows that much of the relevant information that fits users' needs is still missing from Wikidata, while current linked open data (LOD) tools are not suited to enriching large graphs like Wikidata. In this paper, we investigate the potential of enriching Wikidata with structured data sources from the LOD cloud. We present a novel workflow that includes gap detection, source selection, schema alignment, and semantic validation. We evaluate our enrichment method with two complementary LOD sources: a noisy source with broad coverage, DBpedia, and a manually curated source with a narrow focus on the art domain, Getty. Our experiments show that our workflow can enrich Wikidata with high-quality statements from external LOD sources. Property alignment and data quality are key challenges, whereas entity alignment and source selection are well supported by existing Wikidata mechanisms. We provide our code and data to support future work.
Analogical reasoning is a powerful qualitative reasoning tool that enables humans to connect two situations and generalize their knowledge from familiar to novel situations. Cognitive science research provides valuable insights into the richness and complexity of analogical reasoning, together with implementations of expressive analogical reasoners that have limited scalability. Modern scalable AI techniques with the potential for analogical reasoning have so far been applied only to the special case of proportional analogies, not to understanding higher-order analogies. In this paper, we aim to bridge this gap by: 1) formalizing six dimensions of analogy based on mature insights from cognitive science research, 2) annotating a corpus of fables along each of these dimensions, and 3) defining four tasks of increasing complexity that enable scalable evaluation of AI techniques. Experiments with language models and neuro-symbolic AI reasoners on these tasks reveal that state-of-the-art methods can reason by analogy with only limited success, motivating further research on analogical reasoning in AI. We make all our code and data available.
Humans use natural language to compose common concepts from their environment into plausible, day-to-day scene descriptions. However, such generative commonsense reasoning (GCSR) skills are lacking in state-of-the-art text generation methods. Descriptive sentences about arbitrary concepts generated by neural text generation models (e.g., pre-trained text-to-text Transformers) are often grammatically fluent but may not correspond to human common sense, largely due to their lack of mechanisms to capture concept relations, identify implicit concepts, and perform generalizable reasoning about unseen concept compositions. In this paper, we propose an Imagine-and-Verbalize (I&V) method, which learns to imagine a relational scene knowledge graph (SKG) over the input concepts, and leverages the SKG as a constraint when generating a plausible scene description. We collect and harmonize a set of knowledge resources from different domains and modalities, providing a rich auxiliary supervision signal for I&V. The experiments demonstrate the effectiveness of I&V in improving language models on both concept-to-sentence and concept-to-story generation tasks, while enabling the model to learn well from fewer task examples and generate SKGs that make sense to human annotators.
Wikidata is increasingly adopted by many communities for a wide variety of applications, which demand high-quality knowledge to deliver successful results. In this paper, we develop a framework to detect and analyze low-quality statements in Wikidata by shedding light on the current practices exercised by its community. We explore three indicators of data quality in Wikidata, based on: 1) community consensus on the currently recorded knowledge, assuming that statements that have been removed and not re-added are implicitly agreed to be of low quality; 2) statements that have been deprecated; and 3) constraint violations in the data. We combine these indicators to detect low-quality statements, revealing challenges with duplicate entities, missing triples, violated type rules, and taxonomic distinctions. Our findings complement the ongoing efforts of the Wikidata community to improve data quality, aiming to make it easier for users and editors to find and correct mistakes.
Efficient and robust control using spiking neural networks (SNNs) is still an open problem. Whilst behaviour of biological agents is produced through sparse and irregular spiking patterns, which provide both robust and efficient control, the activity patterns in most artificial spiking neural networks used for control are dense and regular -- resulting in potentially less efficient codes. Additionally, for most existing control solutions network training or optimization is necessary, even for fully identified systems, complicating their implementation in on-chip low-power solutions. The neuroscience theory of Spike Coding Networks (SCNs) offers a fully analytical solution for implementing dynamical systems in recurrent spiking neural networks -- while maintaining irregular, sparse, and robust spiking activity -- but it is not clear how to apply it directly to control problems. Here, we extend SCN theory by incorporating closed-form optimal estimation and control. The resulting networks work as a spiking equivalent of a linear-quadratic-Gaussian controller. We demonstrate robust spiking control of simulated spring-mass-damper and cart-pole systems, in the face of several perturbations, including input and system noise, system disturbances, and neural silencing. As our approach does not require learning or optimization, it offers opportunities for deploying fast and efficient task-specific on-chip spiking controllers with biologically realistic activity.
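Because the approach is closed-form, the control half can be illustrated without any learning. The sketch below is a hypothetical, non-spiking stand-in: a steady-state discrete-time LQR gain obtained by backward Riccati iteration for an Euler-discretized spring-mass-damper (state = [position, velocity]). The spiking implementation and the Kalman-style estimator of the full LQG controller are omitted.

```python
# Minimal 2x2 matrix helpers (scalar control input, so no matrix inverse needed).
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_t(A):
    return [list(r) for r in zip(*A)]

def mat_combine(A, B, f):
    return [[f(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def lqr_gain(A, B, Q, R, iters=500):
    """Steady-state discrete-time LQR gain K via backward Riccati iteration:
    P <- Q + A'PA - (B'PA)'(B'PA)/(R + B'PB), then K = B'PA/(R + B'PB)."""
    P = [row[:] for row in Q]
    for _ in range(iters):
        BtPA = mat_mul(mat_mul(mat_t(B), P), A)         # 1x2: B'PA
        s = R + mat_mul(mat_mul(mat_t(B), P), B)[0][0]  # scalar: R + B'PB
        AtPA = mat_mul(mat_mul(mat_t(A), P), A)
        corr = [[u * v / s for v in BtPA[0]] for u in BtPA[0]]
        P = mat_combine(Q, mat_combine(AtPA, corr, lambda x, y: x - y),
                        lambda x, y: x + y)
    return [[v / s for v in BtPA[0]]]

# Spring-mass-damper (m=1, k=1, c=0.2), Euler-discretized with dt=0.1.
dt = 0.1
A = [[1.0, dt], [-1.0 * dt, 1.0 - 0.2 * dt]]
B = [[0.0], [dt]]
K = lqr_gain(A, B, Q=[[1.0, 0.0], [0.0, 1.0]], R=0.1)

# Closed-loop simulation from x = [1, 0] with u = -Kx.
x = [1.0, 0.0]
for _ in range(300):
    u = -(K[0][0] * x[0] + K[0][1] * x[1])
    x = [A[0][0] * x[0] + A[0][1] * x[1] + B[0][0] * u,
         A[1][0] * x[0] + A[1][1] * x[1] + B[1][0] * u]
# After 300 steps, the state has settled near the origin.
```

The paper's contribution is to realize this kind of closed-form controller with sparse, irregular spiking dynamics rather than dense arithmetic; this sketch only shows the optimal-control side of that equivalence.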
Recent advances in language modeling have enabled new conversational systems. In particular, it is often desirable for people to make choices among specified options when using such systems. We address the problem of reference resolution, when people use natural expressions to choose between real-world entities. For example, given the choice `Should we make a Simnel cake or a Pandan cake?', a natural response from a non-expert may be indirect: `let's make the green one'. Reference resolution has been little studied with natural expressions, so robustly understanding such language has large potential for improving naturalness in dialog, recommendation, and search systems. We create AltEntities (Alternative Entities), a new public dataset of entity pairs and utterances, and develop models for the disambiguation problem. Consisting of 42K indirect referring expressions across three domains, it enables for the first time the study of how large language models can be adapted to this task. We find they achieve 82%-87% accuracy in realistic settings, which, while reasonable, also invites further advances.
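A deliberately naive baseline makes the disambiguation problem concrete. This hypothetical sketch (lexical overlap only, nothing like the paper's language-model approach, with made-up candidate descriptions) matches an indirect expression against candidate entity descriptions:

```python
def resolve_reference(expression, candidates):
    """Toy lexical-overlap baseline for indirect reference resolution:
    pick the candidate entity whose description shares the most words
    with the referring expression."""
    expr_words = set(expression.lower().split())

    def overlap(description):
        return len(expr_words & set(description.lower().split()))

    return max(candidates, key=lambda name: overlap(candidates[name]))

# Hypothetical candidate descriptions for the abstract's example.
candidates = {
    "Simnel cake": "a fruit cake topped with marzipan",
    "Pandan cake": "a light green sponge cake flavored with pandan juice",
}
choice = resolve_reference("let's make the green one", candidates)
```

Lexical overlap resolves this example via the shared word "green", but fails as soon as the link is implicit (e.g. "the Easter one"), which is precisely why the dataset targets language-model adaptation.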